Abstract — Binary machines are a generalization of Feedback
Shift Registers (FSRs) in which both feedback and feedforward
connections are allowed and no chain connection between the
register stages is required. In this paper, we present an algorithm
for synthesis of binary machines with the minimum number of
stages for a given degree of parallelization. Our experimental
results show that, for sequences with high linear complexity,
such as complementary, Legendre, or truly random sequences,
parallel binary machines are an order of magnitude smaller than
parallel FSRs generating the same sequence. The presented approach
can potentially be of advantage for any application which requires
sequences with high spectrum efficiency or high security, such as
data transmission, wireless communications, and cryptography.

Index Terms — Feedback shift register, sequences, nonlinear 
complexity 

I. Introduction 

In information theory, it is known that any binary sequence
with a finite period can be generated by a binary machine
shown in Figure 1 [1]. An n-stage binary machine consists of
an n-stage binary register, n updating Boolean functions, and 
a clock. At each clock cycle, the current values of all stages 
of the register are synchronously updated to the next values 
computed by the updating functions. Binary machines can be 
viewed as a more general version of Feedback Shift Registers 
(FSRs). 

Suppose we would like to construct a binary machine which 
generates the following binary sequence: 

A_2 = (0,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,0,0).

Since the output of a binary machine is equal to the least
significant bit of its current state, any assignment of states
S_2 = (s_0, s_1, ..., s_19) such that s_i mod 2 = a_i results in a
binary machine which generates A_2. For example, we can use

S_2 = (0,2,1,3,4,5,7,9,6,8,11,10,13,15,17,12,19,21,14,16)

where even and odd integers are assigned in an increasing
order. From S_2 we can easily see how many stages a binary
machine should have to generate A_2. The largest element of
S_2 is 21. We need 5 bits to expand it in binary. Thus, a binary
machine generating A_2 should have at least 5 stages.

As in the case of traditional Finite State Machines (FSM) 
synthesis [2], for different state assignments we usually get 
different next state functions. The circuit complexity of these 
functions may vary substantially for different state assign- 
ments. We can also use the one-hot encoding instead of the 
binary one. Then, the number of stages will increase, but the 
complexity of functions might decrease in some cases. 

Next we describe an intuitive idea behind the algorithm for 
synthesis of parallel binary machines presented in this paper. 

Fig. 1. A binary n-stage machine with the degree of parallelization one.



Suppose that we use the encoding (00) = 0, (01) = 1, (10) = 2,
(11) = 3 to encode the binary sequence A_2 from the example
above into the following quaternary sequence:

A_4 = (0,3,1,3,0,2,3,2,3,0).

We can construct a quaternary machine generating A_4 (in
which the stages of the register can store 4 different values and
the updating functions are 4-valued) by choosing a sequence of
states S_4 = (s_0, s_1, ..., s_9) such that s_j mod 4 = a_j. For example,
we can assign the states as follows:

S_4 = (0,3,1,7,4,2,11,6,15,8).

Note that the largest element of S_4 is 15. We need 2 quaternary
digits to represent it. Thus, we can generate A_4 using a
quaternary machine with 2 stages (see Figure 4(a)). Such a
quaternary machine can, in turn, be converted into a binary
machine by encoding each 4-valued function by a pair of
Boolean functions and by replacing each quaternary stage
by two binary stages (see Figure 4(b)). The resulting 4-stage
binary machine generates the same binary sequence A_2 as in
the example above, but two bits per clock cycle. Note that in
the example above we needed 5 stages to generate A_2 one bit
per clock cycle. So, we constructed a parallel binary machine
which has fewer stages than the theoretical lower bound on
the number of stages in a binary machine generating the same
sequence sequentially bit by bit.
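
The two state assignments above can be reproduced with a few lines of Python. This is only a sketch of the increasing-order assignment used in the examples; the helper names `assign_states` and `digits_needed` are illustrative and do not come from the paper.

```python
def assign_states(seq, m):
    """Give the j-th occurrence (j = 0, 1, ...) of digit a the integer j*m + a,
    so that every state s satisfies s mod m == a."""
    used = [0] * m
    states = []
    for a in seq:
        states.append(used[a] * m + a)
        used[a] += 1
    return states


def digits_needed(value, m):
    """Number of m-ary digits needed to write a non-negative integer."""
    count = 1
    while value >= m:
        value //= m
        count += 1
    return count


A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
# Encode consecutive bit pairs as quaternary digits: (00)=0, (01)=1, (10)=2, (11)=3.
A4 = [2 * A2[i] + A2[i + 1] for i in range(0, len(A2), 2)]

S2 = assign_states(A2, 2)
S4 = assign_states(A4, 4)

binary_stages_sequential = digits_needed(max(S2), 2)    # one bit per clock cycle
binary_stages_parallel = 2 * digits_needed(max(S4), 4)  # two bits per clock cycle
```

With these inputs the sequential machine needs 5 binary stages, while the quaternary machine needs 2 quaternary stages, that is, 4 binary stages.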

Later in the paper, we show that the number of stages
can be reduced even further by using the 8-ary encoding.
More importantly, we reduce not only the number
of stages, but also the circuit complexity of the updating
functions. Our experimental results show that, for sequences
with high linear complexity, such as complementary, Legendre,
or truly random sequences, parallel binary machines are an order
of magnitude smaller than parallel FSRs generating the same
sequence. Therefore, the presented approach can potentially
be useful for any application which requires sequences with
high spectrum efficiency or high security. Such applications
include data transmission, wireless communications, cryptography,
and many others [3], [4], [5], [6]. A particularly
attractive application is encryption and authentication systems
for smartcards and Radio Frequency IDentification (RFID)
tags. A low-cost RFID tag can spare only a few hundred
gates for security functionality [7]. None of the available
cryptographic systems satisfies this requirement at present [8].

The rest of the paper is organised as follows. Section II
describes basic notation and definitions used in the sequel. In
Section IV we present an algorithm for constructing an m-ary
machine with the minimum number of stages generating
a given m-ary sequence. In Section V we show how m-ary
machines can be encoded to generate binary sequences
in parallel and demonstrate that such an encoding can be
of advantage. Section VI presents the experimental results.
Section VII concludes the paper.

II. Preliminaries 

Let M = {0,1,...,m-1}. An m-ary sequence is a vector
A_m = (a_0, a_1, ...) where a_i ∈ M for all i ≥ 0.

If there exist k > 0 and k_0 ≥ 0 such that a_i = a_{i+k} for all
i ≥ k_0, then A_m is called eventually (or ultimately) periodic. If
k_0 = 0, then A_m is called purely periodic, or simply periodic.
The least integers k_0 and k with this property are called the
pre-period and period of the sequence, respectively [9].

For a multiple-valued function f : M^n → M, the i-set of f
is defined by [10]

i-set(f) = {x ∈ M^n : f(x) = i}.

In the binary case, the 0-set and the 1-set correspond to the off-set
and the on-set of f, respectively [11].

An m-ary n-stage machine consists of n m-ary storage
elements, called stages. Each stage i ∈ {0, 1, ..., n-1} has an
associated state variable x_i ∈ M which represents the current
value of the stage i and an updating function f_i : M^n → M
which determines how the value of x_i is updated.

A state of an n-stage machine is a vector of values of
its state variables. At every clock cycle, the next state of a
machine is determined from its current state by updating
the values of all stages simultaneously to the values of the
corresponding f_i's.

The degree of parallelization of an n-stage machine is the
number of stages p, 1 ≤ p ≤ n, which are used to produce its
output at each clock cycle.
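
The definitions above translate directly into a small simulation model. The following sketch assumes that a state is stored as a tuple indexed by stage number and that the outputs are taken from stages 0 to p-1 in that order; both conventions, and the toy machine `COUNTER`, are illustrative assumptions rather than part of the paper.

```python
def step(state, updating_functions):
    """Synchronously update every stage: the new x_i is f_i(current state)."""
    return tuple(f(state) for f in updating_functions)


def run(state, updating_functions, cycles, p=1):
    """Collect the values of stages 0..p-1 for the given number of clock cycles."""
    out = []
    for _ in range(cycles):
        out.extend(state[:p])
        state = step(state, updating_functions)
    return out


# Hypothetical binary 2-stage machine (m = 2, n = 2).
# Stage 0 toggles; stage 1 uses a connection from stage 0.
COUNTER = (
    lambda s: 1 - s[0],     # f_0
    lambda s: s[1] ^ s[0],  # f_1
)

serial_output = run((0, 0), COUNTER, cycles=4, p=1)
parallel_output = run((0, 0), COUNTER, cycles=4, p=2)
```

Because every f_i receives the whole state, the model places no restriction on which stages feed which, which is the property that distinguishes binary machines from FSRs.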

III. Previous Work 

For the case of Linear FSRs (LFSRs), there are two main
approaches to constructing an LFSR with the degree of
parallelization p: (1) synthesis of subsequences representing the
p-decimation of some phase shift of the original LFSR sequence
and (2) computation of the set of states reachable from any
state in p steps.

Let S be a sequence produced by an LFSR whose characteristic
polynomial g(x) of degree n is irreducible over GF(2).
Let α be a root of g(x) and let T be the period of S. In the
method based on synthesis of subsequences [12], the sequence
S is decomposed into p subsequences S_p^j, each representing the
p-decimation of the jth phase shift of S. In other words, the ith
element of S_p^j is equal to the (i·p + j)th element of S. By Zierler's
theorem [13], for 0 ≤ j < p, the subsequences S_p^j can be
generated by an LFSR with the following properties:

• The minimum polynomial of α^p in GF(2^n) is the characteristic
polynomial g*(x) of the new LFSR, which has:

- Period T* = T/gcd(p,T),

- Degree n*, which is the multiplicative order of 2 in
Z_{T*}.

The Berlekamp-Massey algorithm [14] or its generalizations [15]
can be used to find the smallest LFSR for each
subsequence S_p^j. The size of each LFSR is n*, which is at most
n, i.e. the overall number of bits in p LFSRs is at most p × n.
This method is applicable to any degree of parallelization p
which is not a multiple of the period T.
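
The period and degree of the decimated LFSR can be computed directly from the two properties listed above. The sketch below only evaluates those formulas; the function names are hypothetical, and the case in which the period of the subsequence collapses to 1 is rejected because the text excludes it.

```python
from math import gcd


def multiplicative_order_of_2(modulus):
    """Smallest e >= 1 with 2**e congruent to 1 modulo an odd modulus > 1."""
    if modulus < 2 or modulus % 2 == 0:
        raise ValueError("modulus must be odd and greater than 1")
    order, power = 1, 2 % modulus
    while power != 1:
        power = (power * 2) % modulus
        order += 1
    return order


def decimated_lfsr_parameters(T, p):
    """Return (T*, n*) for the p-decimation of a sequence with period T."""
    t_star = T // gcd(p, T)
    return t_star, multiplicative_order_of_2(t_star)


# A sequence of period 15 (degree-4 LFSR) split into p = 3 subsequences:
t_star, n_star = decimated_lfsr_parameters(15, 3)
total_bits = 3 * n_star  # overall register bits in the p LFSRs
```

For T = 15 and p = 3 each subsequence has period 5 and needs a 4-stage LFSR, so the three LFSRs use 12 bits in total, which matches the p × n upper bound.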

The second approach is based on computing the set of
states reachable from any state in p steps. This is usually
done by computing the pth power of the connection matrix of
the LFSR [16], [17]. Such an approach is applicable to the
degrees of parallelization 1 ≤ p ≤ n. The size of the register
with the degree of parallelization p in this case is the same as
the size of the original LFSR, n.
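
A minimal sketch of the second approach is given below, assuming a 4-stage Fibonacci LFSR with the recurrence x_3' = x_0 ⊕ x_1 as a stand-in example (this LFSR is not taken from the paper). The connection matrix is raised to the pth power over GF(2) by repeated squaring, so that one multiplication by the result advances the register by p steps.

```python
def mat_mul_gf2(a, b):
    """Product of two square 0/1 matrices over GF(2)."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def mat_pow_gf2(matrix, p):
    """p-th power of a square 0/1 matrix over GF(2) by repeated squaring."""
    n = len(matrix)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    base = [row[:] for row in matrix]
    while p:
        if p & 1:
            result = mat_mul_gf2(result, base)
        base = mat_mul_gf2(base, base)
        p >>= 1
    return result


def mat_vec_gf2(matrix, vector):
    """Matrix-vector product over GF(2)."""
    return [sum(r[k] & vector[k] for k in range(len(vector))) % 2 for r in matrix]


# Assumed example: next state is (x1, x2, x3, x0 XOR x1).
CONNECTION = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

four_steps = mat_pow_gf2(CONNECTION, 4)  # advances the register 4 steps at once
```

The register size stays at n = 4; only the rows of the matrix, and hence the XOR networks feeding each stage, change with p.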

For the case of Non-Linear FSRs (NLFSRs), algorithms for
finding a shortest NLFSR generating a given binary sequence
have been presented in [18], [19], [20], and [9]. An NLFSR
with the degree of parallelization p can be constructed by
computing the set of states reachable from any state in p
steps, as in the approach (2) for LFSRs. This can be done by
computing the pth power of the transition relation of the NLFSR.
However, the size of the pth power of the transition relation of
an NLFSR usually grows much faster than in the LFSR case.
Therefore, in practice, in applications which use NLFSRs with
the degree of parallelization p, NLFSRs are selected so that
variables of the p left-most stages of the NLFSR are not used
in the updating functions. In such a case, an NLFSR with the
degree of parallelization p can be constructed by duplicating
the updating functions p times [21], [22], [23].

For binary machines with the degree of parallelization
one, an algorithm for constructing a shortest binary machine
generating a given binary sequence has been presented in [24].

IV. Synthesis Algorithm 

The algorithm presented in this section exploits the property
of m-ary n-stage machines that any m-ary n-tuple can be the
next state of a given current state. Note that, in the traditional
n-stage NLFSRs in the Fibonacci configuration [1], the next
state overlaps with the current state in n - 1 positions. NLFSRs
in the Galois configuration are more flexible. However,
since they do not allow feedforward connections, their set of
possible next states is still restricted to a certain subset of all
possible states [25].

Algorithm 1 Construct an m-ary machine which generates
an m-ary sequence A = (a_0, a_1, ..., a_{k-1}) with the degree of
parallelization one.
 1: for every i from 0 to m - 1 do
 2:   N_i := 0; /* counts the number of digits with value i ∈ M */
 3: end for
 4: for every j from 0 to k - 1 do
 5:   N_{a_j} := N_{a_j} + 1;
 6: end for
 7: N_max := max_{i∈M} N_i
 8: for every i from 0 to m - 1 do
 9:   B_i := ∅
10:   for every j from 0 to N_max - 1 do
11:     B_i := B_i ∪ {j·m + i};
12:   end for
13: end for
14: for every i from 0 to m - 1 do
15:   B_i := [b_{i,0}, b_{i,1}, ..., b_{i,N_max-1}] is an arbitrary permutation
      of B_i;
16:   r_i := 0; /* records how many elements of B_i were used */
17: end for
18: for every j from 0 to k - 1 do
19:   s_j := b_{a_j,r_{a_j}}; /* b_{a_j,r_{a_j}} is the r_{a_j}th element of B_{a_j} */
20:   r_{a_j} := r_{a_j} + 1;
21: end for
22: n = ⌈log_m N_max⌉ + 1;
23: for every j from 0 to k - 1 do
24:   Expand s_j as an m-ary vector s_j := (s_{j,n-1}, s_{j,n-2}, ..., s_{j,0});
25: end for
    /* The resulting sequence S = (s_0, s_1, ..., s_{k-1}) is interpreted
       as a sequence of states of an m-ary n-stage machine */
26: for every p from 0 to n - 1 do
27:   for every i from 0 to m - 1 do
28:     i-set(f_p) = ∅;
29:   end for
30: end for
31: for every j from 0 to k - 1 do
32:   for every p from 0 to n - 1 do
33:     i = s_{(j+1),p}; /* "+" is mod k */
34:     i-set(f_p) = i-set(f_p) ∪ {(s_{j,n-1}, s_{j,n-2}, ..., s_{j,0})};
35:   end for
36: end for
37: Return (f_0, f_1, ..., f_{n-1});

The input of the algorithm is an m-ary sequence A of
length k. First, we show how to construct a sequence of
integers S = (s_0, s_1, ..., s_{k-1}) such that s_j mod m = a_j for all
j ∈ {0, 1, ..., k-1}. We count the number of occurrences N_i of
each digit with the value i ∈ M in A, and determine
the largest number of occurrences, N_max = max_{i∈M} N_i.

For each i ∈ M, let B_i be the set consisting of the N_max non-negative
integers of type j·m + i for all j ∈ {0, 1, ..., N_max - 1}. Let
B_i = [b_{i,0}, b_{i,1}, ..., b_{i,N_max-1}] be an arbitrary permutation of B_i.

Initially, for all i ∈ M, we set to zero a counter r_i which
counts how many elements of B_i have been used. Then, for every
j from 0 to k-1, we take the jth element of the sequence
A, a_j, and assign s_j to the r_{a_j}th element of B_{a_j}. It is easy to see
from our construction that s_j mod m is equal to a_j.

Let S = (s_0, s_1, ..., s_{k-1}) be a sequence constructed as described
above. Each integer s_j ∈ S can be represented as an m-ary
expansion (s_{j,n-1}, s_{j,n-2}, ..., s_{j,0}) ∈ M^n where n is the number
of m-ary digits needed to represent the largest integer of S and
s_{j,0} is the least significant digit of the expansion. We interpret
each n-tuple (s_{j,n-1}, s_{j,n-2}, ..., s_{j,0}) as a state of an m-ary n-stage
machine. By construction, s_{j,0} = a_j for all j ∈ {0, 1, ..., k-1}.

Next, we define a mapping s_j ↦ s_{j+1}, for all j ∈ {0, 1, ..., k-1},
where "+" is mod k. This mapping assigns s_{j+1} to be the
next state of a current state s_j of an m-ary n-stage machine.
Each of the m^n - k remaining states of the m-ary n-stage machine
is left unspecified. This gives us the freedom to specify the
updating functions in a way which minimizes their circuit
complexity.

The i-sets of the updating functions implementing the resulting
mapping are derived as follows. Initially i-set(f_p) = ∅,
for all p ∈ {0, 1, ..., n-1} and all i ∈ M. For every j from
0 to k-1, and every p from 0 to n-1, we add
(s_{j,n-1}, s_{j,n-2}, ..., s_{j,0}) to the i-set of f_p where i = s_{(j+1),p} and
"+" is mod k.

The algorithm described above is summarized as Algorithm 1.
Its worst-case time complexity is O(n·k) (assuming
k > m, which is normally the case).
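
The construction described above can be sketched in Python as follows. This is an illustrative implementation, not the authors' code: it fixes the permutation of each B_i to increasing order (any permutation is admissible), and the name `synthesize_machine` and the layout of the returned i-sets are assumptions.

```python
def synthesize_machine(seq, m):
    """Return (n, states, isets) for an m-ary sequence of length k.

    isets[p][i] is the i-set of the updating function f_p, stored as a set of
    current states written as tuples (s_{n-1}, ..., s_0).
    """
    k = len(seq)
    counts = [0] * m
    for a in seq:
        counts[a] += 1
    n_max = max(counts)

    # Take the elements of B_a in increasing order: a, m + a, 2m + a, ...
    used = [0] * m
    states = []
    for a in seq:
        states.append(used[a] * m + a)
        used[a] += 1

    # n = ceil(log_m N_max) + 1, computed with integers to avoid rounding.
    n, capacity = 1, 1
    while capacity < n_max:
        capacity *= m
        n += 1

    def expand(s):
        digits = []
        for _ in range(n):
            digits.append(s % m)
            s //= m
        return tuple(reversed(digits))

    vectors = [expand(s) for s in states]
    isets = [[set() for _ in range(m)] for _ in range(n)]
    for j in range(k):
        nxt = vectors[(j + 1) % k]
        for p in range(n):
            i = nxt[n - 1 - p]  # digit p of the next state
            isets[p][i].add(vectors[j])
    return n, states, isets


A4 = [0, 3, 1, 3, 0, 2, 3, 2, 3, 0]
n, S4, isets = synthesize_machine(A4, 4)
```

States which appear in no i-set are the unspecified (don't care) states; a logic minimizer is free to assign them any value.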

Theorem 1: Algorithm 1 constructs an m-ary n-stage
machine generating an m-ary sequence A of length k with the
degree of parallelization one, where n is given by

n = ⌈log_m N_max⌉ + 1, (1)

where N_max = max_{i∈M} N_i.

Proof: At step 7 of Algorithm 1, for each i ∈ M, N_i
is equal to the number of digits with the value i in the sequence
A. From step 6 of Algorithm 1 we can conclude that,
for each i ∈ M, the largest integer s_j ∈ S such that s_j mod
m = i is equal to m(N_i - 1) + i. We need ⌈log_m N_i⌉ + 1 m-ary
digits to express this integer for any N_i > 0. Since k ≥ 1, the
number of stages in the m-ary n-stage machine is given by
⌈log_m N_max⌉ + 1 where N_max = max_{i∈M} N_i.

□

The lemma below shows under which conditions the
bound given by (1) is an exact lower bound.

Lemma 1: Given a purely periodic m-ary sequence A_m with
the period k, any m-ary machine which generates A_m with the
degree of parallelization one has at least n stages, where n
is given by (1).

Proof: The existence of an m-ary machine with n =
⌈log_m N_max⌉ + 1 stages which can generate A_m follows from
Theorem 1. It remains to prove that no m-ary n'-stage
machine with n' < n can generate A_m.

Assume that such a machine exists. Then, if A_m is purely
periodic and has the period k, to be able to generate one digit
of A_m per clock cycle with the period k, the m-ary n'-stage
machine must have at least N_i distinct states whose 0th stage
has the value i. We need at least ⌈log_m N_i⌉ + 1 m-ary stages to
implement the largest of these states for any N_i > 0. So, we
can conclude that n' ≥ ⌈log_m N_max⌉ + 1, which contradicts the
assumption that n' < n.

□

As an example, consider the 4-ary sequence from the
Introduction section:

A_4 = (0,3,1,3,0,2,3,2,3,0).

We have N_max = 4. So:

B_0 = {0,4,8,12},
B_1 = {1,5,9,13},
B_2 = {2,6,10,14},
B_3 = {3,7,11,15}.

Suppose we use the following permutations of the B_i's:

B_0 = [0,4,8,12],
B_1 = [1,5,9,13],
B_2 = [2,6,10,14],
B_3 = [3,7,11,15].

Then we get:

S_4 = (0,3,1,7,4,2,11,6,15,8).

Since N_max = 4, from Theorem 1 we can conclude that
the quaternary machine which generates A_4 has 2 stages. By
applying the mapping described in Algorithm 1 to S_4, we
get the following i-sets for the updating functions f_0 and f_1:

0-set(f_1) = {(00), (03), (10), (20)}
1-set(f_1) = {(01), (13), (23)}
2-set(f_1) = {(02), (33)}
3-set(f_1) = {(12)}

0-set(f_0) = {(13), (20), (33)}
1-set(f_0) = {(03)}
2-set(f_0) = {(10), (23)}
3-set(f_0) = {(00), (01), (02), (12)}.

The defining tables of these functions are shown in Figure 2.
The symbol "-" stands for a don't care value.
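
The defining tables can be rebuilt from S_4 and checked by simulation. The sketch below is an assumption-laden illustration: the row and column layout of the printed table (rows indexed by x_1, columns by x_0) is a guess rather than the layout of Figure 2, and all don't cares are resolved to 0 during simulation.

```python
M = 4
S4 = [0, 3, 1, 7, 4, 2, 11, 6, 15, 8]


def to_digits(s):
    """Write a state as the pair (s_1, s_0) of quaternary digits."""
    return (s // M, s % M)


# Specified part of the next-state mapping: s_j -> s_{(j+1) mod k}.
TABLE = {to_digits(s): to_digits(S4[(j + 1) % len(S4)])
         for j, s in enumerate(S4)}


def defining_table(stage):
    """Rows are x_1 = 0..3, columns are x_0 = 0..3; '-' marks a don't care."""
    rows = []
    for x1 in range(M):
        row = []
        for x0 in range(M):
            nxt = TABLE.get((x1, x0))
            row.append("-" if nxt is None else str(nxt[1 - stage]))
        rows.append(" ".join(row))
    return rows


def simulate(cycles, start=(0, 0)):
    """Output stage 0 each cycle; unspecified states are sent to (0, 0)."""
    out, state = [], start
    for _ in range(cycles):
        out.append(state[1])
        state = TABLE.get(state, (0, 0))
    return out


table_f0 = defining_table(0)
table_f1 = defining_table(1)
```

Only 10 of the 16 states are specified, which is the freedom that the synthesis step can exploit when minimizing f_0 and f_1.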

Fig. 2. Defining table for the updating functions of the 4-ary 2-stage machine
in Figure 4(a). The symbol "-" stands for a don't care (unspecified) value.

Fig. 3. Defining tables for the updating functions of the binary 4-stage
machine in Figure 4(b) for the case when all don't cares are specified to 0.
The pairs (f_11, f_10) and (f_01, f_00) encode the 4-valued functions f_1 and f_0 in
Figure 2, respectively.

Note that, in Lemma 1, we require that A is purely periodic
with the period k. The need for the latter condition is obvious:
if A repeats two or more times within the input sequence of
length k given to Algorithm 1, then fewer stages than given by
(1) are needed to generate A. The former condition is necessary
because, if the sequence is eventually periodic, we might be
able to generate it with a machine which has fewer stages than
given by (1). As an illustration, consider an eventually periodic
binary sequence (1,1,0,0,1,0,1,0,1) with pre-period 3 and
period 2. By using Algorithm 1, we can construct a binary
machine with 4 stages which repeats this sequence with the
period 9. However, we can also construct a binary machine
with 3 stages whose state transition graph has a cycle of
length 2, corresponding to the period (0,1), and a branch
implementing (1,1,0) which leads to the cycle. In some cases,
the binary machine constructed by the latter approach might
be smaller than the one constructed using Algorithm 1.
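
The 3-stage alternative for the eventually periodic example can be sketched as below. The particular state numbers are one possible assignment chosen for illustration; the paper does not list them.

```python
def assign_states(bits):
    """Distinct integers whose least significant bit equals the output bit."""
    used = {0: 0, 1: 0}
    states = []
    for b in bits:
        states.append(2 * used[b] + b)
        used[b] += 1
    return states


PRE_PERIOD = [1, 1, 0]
PERIOD = [0, 1]

# One state per digit of the branch followed by one state per digit of the cycle.
states = assign_states(PRE_PERIOD + PERIOD)
next_state = {states[i]: states[i + 1] for i in range(len(states) - 1)}
next_state[states[-1]] = states[len(PRE_PERIOD)]  # close the cycle of length 2

n_stages = max(states).bit_length()


def run(start, cycles):
    """Emit the least significant bit of the state at every clock cycle."""
    out, s = [], start
    for _ in range(cycles):
        out.append(s & 1)
        s = next_state[s]
    return out


output = run(states[0], 9)
```

Here five states suffice, so 3 stages are enough, whereas treating the nine digits as one period gives N_max = 5 and ⌈log_2 5⌉ + 1 = 4 stages.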

V. Generation of Binary Sequences 

We can use m-ary n-stage machines for generating binary
sequences by encoding their m-ary stages and m-valued
functions using at most ⌈log_2 m⌉ · n binary stages and Boolean
functions.

As an example, consider the quaternary 2-stage machine
from the example in the previous section. Figure 4(a) shows
its quaternary implementation. Figure 4(b) shows the same
machine in which the updating functions f_0 and f_1 are encoded
by a pair of Boolean functions (f_{i1}, f_{i0}), i ∈ {0, 1}, using the
encoding 0 = (00), 1 = (01), 2 = (10), 3 = (11). The defining
tables for the Boolean functions are shown in Figure 3. We
specified all don't cares of f_0 and f_1 to 0. The resulting binary
4-stage machine generates the following sequence A_2 two bits
per clock cycle:

A_2 = (0,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,0,0). (2)

As we showed in the Introduction, if instead of using the
quaternary encoding, we use Algorithm 1 to construct a binary
machine for A_2 directly, we get N_0 = 9 and N_1 = 11 and thus
a machine with n = ⌈log_2 11⌉ + 1 = 5 stages.




Fig. 4. (a) A quaternary 2-stage machine with the degree of parallelization
one. (b) The machine from (a) encoded as a binary 4-stage machine with the
degree of parallelization two.



Let us see whether we can reduce the number of stages even
more if we use the 8-ary encoding. We group the bits of A_2 in
triples to get the following 8-ary sequence:

A_8 = (1,5,6,2,7,3,0).

Note that we have added an extra 0 to A_2 to make its length
a multiple of 3. Using Algorithm 1, we can derive the
following sequence of integers S_8 = (s_0, s_1, ..., s_6) such that
s_j mod 8 = a_j for all j ∈ {0, 1, ..., 6}:

S_8 = (1,5,6,2,7,3,0).

As we can see, S_8 = A_8, because none of the digits of A_8 occurs
more than once. By Theorem 1, we need n = ⌈log_8 1⌉ +
1 = 1 stage to implement this sequence by an 8-ary machine.
The updating function of this machine is defined in Figure 5.
By encoding the 8-ary 1-stage machine in binary, we get a
binary 3-stage machine with the updating functions defined in
Figure 6 which generates three bits of A_2 per clock cycle. So,
we saved one more stage by using the 8-ary encoding.

Before presenting the main result of the paper, let us
formally define m-ary encodings.

Definition 1: For m = 2^p, p > 0, an m-ary encoding of a
binary sequence A_2 of length k is the m-ary sequence A_m
of length ⌈k/p⌉ which is obtained from A_2 by replacing
the consecutive p-tuples of bits of A_2, (a_i, a_{i+1}, ..., a_{i+p-1}),
i ∈ {0, p, 2p, ..., (⌈k/p⌉ - 1)·p}, by the value a_i·2^{p-1} + a_{i+1}·2^{p-2} +
... + a_{i+p-1}·2^0. If k mod p ≠ 0, then the length of A_2 is
extended to the minimum k' such that k' mod p = 0 and
k' > k. The appended bits are chosen so that the resulting
N_max = max_{i∈M} N_i is minimum.
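
A direct way to realize this definition is to try every possible padding and keep one with the smallest N_max. The sketch below does this exhaustively, which is affordable because fewer than p bits are ever appended; it reads each p-tuple as a binary number with the first bit as the most significant one, matching the encoding (00) = 0, (01) = 1, (10) = 2, (11) = 3 used in the examples. The function name is illustrative.

```python
from collections import Counter
from itertools import product


def m_ary_encoding(bits, p):
    """Encode a non-empty bit list as digits in base m = 2**p."""
    pad = (-len(bits)) % p
    best = None
    for tail in product((0, 1), repeat=pad):
        padded = list(bits) + list(tail)
        digits = [int("".join(map(str, padded[i:i + p])), 2)
                  for i in range(0, len(padded), p)]
        n_max = max(Counter(digits).values())
        if best is None or n_max < best[0]:
            best = (n_max, digits)
    return best[1]


A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
A4 = m_ary_encoding(A2, 2)
A8 = m_ary_encoding(A2, 3)
```

For p = 3 the single appended bit must be 0: appending 1 would create a second digit 1 and raise N_max from 1 to 2.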

Fig. 5. Defining table for the updating function f_0 of the 8-ary 1-stage
machine from the example.

The following theorem gives the lower bound on the
number of stages in a binary machine with the degree of
parallelization p.

Theorem 2: Let A_2 be a purely periodic binary sequence
with the period k. Any binary machine which generates A_2
with the degree of parallelization p ≥ 1 has at least n stages,
where n is given by:

n = ⌈log_2 N_max⌉ + p

where N_max = max_{i∈M} N_i and N_i is the number of digits with
the value i in the m-ary encoding of A_2, m = 2^p.

Proof: Let m = 2^p where p is the degree of parallelization,
p > 0. From step 6 of Algorithm 1 we can conclude
that, for each i ∈ M, the largest integer s_j ∈ S such that s_j mod
m = i is equal to m(N_i - 1) + i. We need ⌈log_2 N_i⌉ + p binary
digits to express this integer for any N_i > 0. Therefore, for
k ≥ 1, the number of stages in the binary n-stage machine is
at most n ≤ ⌈log_2 N_max⌉ + p where N_max = max_{i∈M} N_i.

To be able to generate p bits of A_2 per clock cycle,
the binary n-stage machine must have at least N_i distinct
states whose p least significant bits correspond to the binary
encoding of the value i. If A_2 is purely periodic with the
period k, we need at least ⌈log_2 N_i⌉ + p binary stages to
implement the largest of these states for any N_i > 0. Therefore,
n ≥ ⌈log_2 N_i⌉ + p for all i ∈ M.

So, we can conclude that n = ⌈log_2 N_max⌉ + p.

□
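
As a worked check of the bound, the values for the sequence A_2 from (2) can be evaluated for p = 1, 2, 3, using the digit counts from the earlier examples (N_max = 11 for the binary sequence, 4 for the quaternary encoding, and 1 for the 8-ary encoding with one appended 0):

```latex
\begin{align*}
p = 1:\quad & n = \lceil \log_2 11 \rceil + 1 = 4 + 1 = 5,\\
p = 2:\quad & n = \lceil \log_2 4 \rceil + 2 = 2 + 2 = 4,\\
p = 3:\quad & n = \lceil \log_2 1 \rceil + 3 = 0 + 3 = 3.
\end{align*}
```

These values agree with the 5-stage, 4-stage, and 3-stage binary machines constructed in the examples.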

The technique presented above opens a new possibility
for increasing the throughput of FSR-based binary sequence
generators. As we mentioned in Section III, at present, the
generation of p bits of a sequence per clock cycle is usually
achieved by duplicating the combinational logic implementing
the updating functions of the FSR p times [21], [22], [23].

As an example, consider the sequence A_2 given by (2).
According to Example V.1 in [9]¹, the shortest non-linear
FSR in the Fibonacci configuration which can generate A_2 has
7 stages and the following updating function of the stage 6:

fe = xqx\ (Bxqxi © xoxi ©X0X1X2X3 ©X0X1X2X3 

©X0X1X2X3 ffixox 1x2x3x4x5X6 0x0x1x2x3x4x5X6. 

The updating functions of the remaining stages of the NLFSR
are of type f_i = x_{i+1}, for i ∈ {0,1,...,5}. If we use the number
of 2-input XORs and ANDs as a measure of cost, then the cost
of f_6 is 24 ANDs + 7 XORs.

On the other hand, as shown above, we can generate 3 bits
of A_2 per clock cycle using the 3-stage binary machine with
the updating functions defined in Figure 6. We can express
these functions as follows, where ¬ denotes complement:

f_02 = x_00·¬x_01 ⊕ ¬x_00·x_01·¬x_02
f_01 = ¬x_00·x_01 ⊕ x_00·x_02
f_00 = ¬x_02 ⊕ x_00·x_01.

In total, f_02, f_01 and f_00 have 6 ANDs and 3 XORs. So, the cost
of generating 3 bits of A_2 per clock cycle using this binary
3-stage machine is 3 binary stages of a register + 6 ANDs +
3 XORs.
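
The 3-stage machine can be checked by simulation. The sketch below assumes the following reading of the three updating functions, with the placement of complements derived from the state sequence 1, 5, 6, 2, 7, 3, 0 and with the unused state 4 mapped to 0: f_02 = x_00·¬x_01 ⊕ ¬x_00·x_01·¬x_02, f_01 = ¬x_00·x_01 ⊕ x_00·x_02, f_00 = ¬x_02 ⊕ x_00·x_01. It also assumes that x_02 is the most significant bit and is output first.

```python
def step(x02, x01, x00):
    """One clock cycle of the binary 3-stage machine (6 ANDs, 3 XORs)."""
    f02 = (x00 & (1 ^ x01)) ^ ((1 ^ x00) & x01 & (1 ^ x02))
    f01 = ((1 ^ x00) & x01) ^ (x00 & x02)
    f00 = (1 ^ x02) ^ (x00 & x01)
    return f02, f01, f00


def generate(cycles, state=(0, 0, 1)):
    """Emit three bits per clock cycle, starting from the state encoding 1."""
    out = []
    for _ in range(cycles):
        out.extend(state)
        state = step(*state)
    return out


A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
bits = generate(7)  # 21 bits: A2 followed by the appended 0
```

Seven clock cycles are enough to produce all 20 bits of A_2, compared with 20 cycles for a machine with the degree of parallelization one.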

To make a crude comparison of the two costs, let us assume
that the costs of the 2-input AND and the 2-input XOR are 1,
and the cost of one stage of a register is 2. Then, the cost of
the NLFSR is 7·2 + 24 + 7 = 45, while the cost of the binary
machine is 3·2 + 6 + 3 = 15. So, the binary machine is not only
3 times faster, but also 3 times smaller.

¹ The sequence in Example V.1 in [9] does not contain the last bit of
A_2, but this does not change the updating functions of the NLFSR.

Fig. 6. Defining tables for the updating functions (f_02, f_01, f_00) representing
the binary encoding of the 8-valued function in Figure 5 for the case when
the don't care is specified to 0.

VI. Experimental Results 

To evaluate the presented approach, we compared the areas
of binary machines, LFSRs and NLFSRs generating the same
sequence for 3 types of sequences: truly random, complementary,
and Legendre. All experiments were run on a PC with an
Intel dual-core 1.8 GHz processor and 2 Gbytes of memory.
The area was computed using the ABC synthesis tool [26] by first
optimizing the circuits with the resyn script and then by mapping
them with map. In the results reported below, 1 unit of area
is equal to the area of a 2-input NAND gate.

In the first set of experiments, for each n in the range 4 <
n < 16, we generated 20 truly random sequences of length 2^n
using the method [27]. Columns 2-4 of Table I show the areas
of the resulting LFSRs, NLFSRs and binary machines (BM)
for the degree of parallelization one. Columns 5-7 of Table I
show similar results for the degree of parallelization equal
to the number of stages in binary machines (which is always
less than or equal to the number of stages in LFSRs and NLFSRs).
Each entry is an average for 20 sequences.

LFSRs are quite bad for generating truly random sequences².
The number of their stages grows roughly as a
half of the sequence length. For NLFSRs, the number of
stages grows much slower. However, the combinational area
of parallel NLFSRs grows so fast that they become hard to
synthesize for random sequences longer than 256 bits. As we
can see from Table I, on average, the area of parallel binary
machines is an order of magnitude smaller than the area of
parallel LFSRs and NLFSRs.

² Note that there is a subset of pseudo-random sequences, called
m-sequences, for which LFSRs are extremely efficient. An n-stage LFSR with
a primitive polynomial of degree n generates an m-sequence of length 2^n - 1.
If the primitive polynomial has k non-zero terms, then to implement such an
LFSR with the degree of parallelization p, we need n stages and no more
than k · p XORs. However, due to the linearity of LFSRs, m-sequences
are easy to reconstruct from a short segment.

Table II shows the results for complementary sequences.
Complementary sequences are a pair of sequences whose
aperiodic autocorrelation coefficients sum up to zero [28].
These sequences are known to have a low peak-to-mean
envelope power ratio, good error detection capabilities, and
high nonlinearity [4]. They are recommended for orthogonal
frequency division multiplexing [4] and for multicarrier code
division multiple access systems [5]. We can see that, on
average, parallel binary machines are an order of magnitude
smaller than parallel LFSRs and NLFSRs.
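
The defining property of complementary sequences can be checked numerically. The sketch below uses the ±1 representation of the bits and the standard concatenation construction for Golay pairs as an illustrative source of examples; the paper does not state how its complementary sequences were generated, so both choices are assumptions.

```python
def aperiodic_autocorrelation(seq, shift):
    """Sum of seq[i] * seq[i + shift] over all overlapping positions."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_complementary_pair(a, b):
    """True if the autocorrelations of a and b cancel at every non-zero shift."""
    if len(a) != len(b):
        return False
    return all(
        aperiodic_autocorrelation(a, u) + aperiodic_autocorrelation(b, u) == 0
        for u in range(1, len(a))
    )


def golay_double(a, b):
    """Concatenation construction: (a, b) -> (a|b, a|-b) doubles the length."""
    return a + b, a + [-x for x in b]


pair = ([1, 1], [1, -1])
for _ in range(3):  # lengths 4, 8, 16
    pair = golay_double(*pair)

# Map +1 -> 0 and -1 -> 1 to obtain the binary sequences fed to a synthesizer.
binary_pair = tuple([(1 - x) // 2 for x in seq] for seq in pair)
```

Each member of such a pair can then be treated as an ordinary binary input sequence for the synthesis procedure.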

Table III shows the results for extended Legendre sequences.
Extended Legendre sequences are known to have the asymptotic
merit factor of 6.3421, which is the highest of all known
families of sequences of an arbitrary length [6]. The higher the
merit factor of a sequence which is used to modulate a signal,
the more uniformly the signal energy is distributed over the
frequency range. This is important for spread-spectrum
communication systems, ranging systems, and radar systems [5],
[6]. Again, on average, parallel binary machines are an order
of magnitude smaller than parallel LFSRs and NLFSRs.
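
The merit factor mentioned above can be computed from the aperiodic autocorrelation. The sketch below uses the usual definition F = N² / (2 · Σ C_u²), summed over the non-zero shifts u, and builds a plain Legendre sequence of prime length with the first element set to +1. It does not implement the extended (rotated and appended) construction used in the experiments, which is not specified in the text.

```python
def legendre_sequence(p):
    """±1 sequence of odd prime length p: +1 at index 0 and at quadratic residues."""
    residues = {(x * x) % p for x in range(1, p)}
    return [1] + [1 if i in residues else -1 for i in range(1, p)]


def aperiodic_autocorrelation(seq, shift):
    """Sum of seq[i] * seq[i + shift] over all overlapping positions."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def merit_factor(seq):
    """N^2 divided by twice the energy of the autocorrelation sidelobes."""
    n = len(seq)
    sidelobe_energy = sum(aperiodic_autocorrelation(seq, u) ** 2
                          for u in range(1, n))
    return n * n / (2 * sidelobe_energy)


# Barker sequence of length 13, used here only as a sanity check.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

legendre_7 = legendre_sequence(7)
barker_merit = merit_factor(BARKER_13)
```

The same `merit_factor` function can be applied to any ±1 sequence, so different rotations or extensions of a Legendre sequence can be compared before synthesis.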

VII. Conclusion 

In this paper, we present a method for constructing binary 
machines with the minimum number of stages for a given 
degree of parallelization. Our experimental results show that, 
for sequences with high linear complexity, such as comple- 
mentary, Legendre, or truly random sequences, parallel binary 
machines are an order of magnitude smaller than parallel 
LFSRs and NLFSRs generating the same sequence. 

Our results can be beneficial for any application which 
requires sequences with high spectrum efficiency or high 
security, such as data transmission, wireless communications, 
and cryptography. 

TABLE I

Area results for random sequences (average for 20 sequences); '-' stands for time out to compute the result (15 min).

TABLE II

Area results for complementary sequences; '-' stands for time out to compute the result (15 min).

TABLE III

Area results for extended Legendre sequences; '-' stands for time out to compute the result (15 min).